Abstract

Recent tests for the convergence hypothesis derive from regressing average growth rates on
initial levels: a negative initial level coefficient is interpreted as convergence. These tests
turn out to be plagued by Francis Galton's classical fallacy of regression towards the mean.
Using a dynamic version of Galton's fallacy, we establish that, in fact, coefficients of arbitrary
signs in such regressions are consistent with an unchanging cross-section distribution
of incomes.


Feathers hit the ground before their weight can leave the air.

R.E.M. 

1.  Introduction 

Do the incomes or productivity levels of different economies have a tendency to converge? Numerous
researchers have recently examined this issue by calculating the cross-section regression of measured
growth rates on initial levels. See for instance Barro (1989), Baumol (1986), DeLong (1988), Dowrick
and Nguyen (1989), Murphy, Shleifer, and Vishny (1990), and many others. Murphy, Shleifer, and
Vishny have called this the "Barro regression." Evidently, in the Barro regression, a negative
coefficient on initial levels is taken to indicate convergence.

This  paper  clarifies  what  such  initial  level  regressions  are  able  to  uncover.    As  used  in  this 
literature,  the  term  "convergence"  can  mean  a  number  of  different  things: 

(a)  Countries  originally  richer  than  average  are  more  likely  to  turn  below  average  eventually,  and 
vice  versa;  the  cycle  repeats; 

(b) Whether a country's income is eventually above or below average is independent of that economy's
original position;

(c) Income disparities between countries have neither unit roots nor deterministic time trends; and

(d) Each country eventually becomes as rich as all the others; the cross-section dispersion diminishes
over time.

Cases  (a)  and  (b)  vaguely  correspond  to  the  notion  of  mixing  in  econometrics  (see  e.g.  White 
(1984)). Case (c) is one formulation of persistence in income disparities: from a time-series
perspective, it is the natural way to examine dependence on initial conditions. This particular probability
model  raises  interesting  econometric  issues  in  the  context  of  unit  root  random  fields  (see  Quah 
(1990a));  it  is,  however,  quite  different  in  spirit  and  in  substance  from  initial  level  regressions.  Case 
(d)  is  closest  to  the  notion  of  poorer  countries  eventually  catching  up  with  richer  countries. 

If (a) and (b) are the cases of interest, models for studying transitional characteristics (for
example, those used in the income distribution and earnings mobility literature) would seem appropriate.
Thus, Quah (1990b) attempts to uncover such effects in the context of heterogeneous Markov
chains.  Overall,  however,  the  work  using  initial  levels  regressions  strongly  suggests  case  (d)  as  being 
of  interest. 

This paper shows that the widely-used initial level regressions, in fact, shed no light on convergence
in the sense of (d). I develop an analogy between those regressions and Galton's classical
fallacy  of  regression  towards  the  mean.  Recall  that  Galton,  in  his  aristocratic  manner,  was  concerned 
about  the  sons  of  tall  fathers  regressing  into  a  pool  of  mediocrity  along  with  the  sons  of  everyone 
else.  Galton  inferred  this  from  observing  that  taller-than-average  fathers  had  sons  who  turned  out 
to  be  not  as  much  above  average  as  the  fathers  themselves.  However,  he  could  not  reconcile  this  with 
the  fact  that  the  observed  population  of  male  heights  continued  to  display  significant  cross-section 
dispersion.  I  show — using  exactly  the  same  reasoning  that  reveals  Galton's  error — that  a  negative 
cross-section  regression  coefficient  on  initial  levels  is,  in  fact,  perfectly  consistent  with  absence  of 
convergence  in  the  sense  of  (d). 

While  Galton's  formulation  is  convenient  for  analyzing  observations  at  two  points  in  time,  it 
offers little by way of interesting dynamics. Extending the analysis to permit such dynamics, I show
in Section 3 below that a given cross-section distribution, replicating itself over time, is consistent
with  arbitrary  signs  on  the  cross-section  initial  levels  regression  coefficient.    In  other  words,  the 
sign  of  the  initial  levels  regression  coefficient  says  nothing  about  whether  there  is  convergence  or 
divergence. 

The  final  two  sections  below  consider  alternative  probability  models  that  might  justify  these 
initial  levels  regressions.  We  will  see,  however,  that  there  are  significant  econometric  difficulties  in 
interpreting  the  estimation  results  from  such  models. 

2.   Galton's  Fallacy  for  the  Convergence  Hypothesis 

To make the point clearly, consider the simplest case. Let $Y_j(t)$ denote (the logarithm of) measured
per capita income or productivity in country $j$ and period $t$. For $t_2$ different from $t_1$, the cross-section
regression of $Y(t_2)$ on a constant and $Y(t_1)$ is:

$$P[Y(t_2) \mid 1, Y(t_1)] = E_C Y(t_2) + \lambda \cdot (Y(t_1) - E_C Y(t_1)), \tag{1}$$

where

$$\lambda = \mathrm{Var}_C(Y(t_1))^{-1}\, \mathrm{Cov}_C(Y(t_2), Y(t_1)).$$

In (1), $P[\,\cdot \mid \cdot\,]$ and $E_C$, $\mathrm{Var}_C$, and $\mathrm{Cov}_C$ indicate projection and cross-section expectation, variance,
and covariance respectively. Suppose that there is no convergence, i.e.,

$$\mathrm{Var}_C(Y(t_1)) = \mathrm{Var}_C(Y(t_2)).$$

The Cauchy-Schwarz inequality immediately implies that the regression coefficient $\lambda$ is no greater than 1
in absolute value, and strictly less than 1 unless $Y(t_1)$ and $Y(t_2)$ are perfectly correlated across
countries. This of course is simply the Galton fallacy: economies with higher than average
incomes at $t_1$ (tall parents) have incomes that are not as high above average at $t_2$ (offspring regressing
towards mediocrity). Note that this happens exactly when the cross-section variances at $t_1$ and $t_2$
are equal, i.e., when there is no convergence of cross-section incomes.
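
The Cauchy-Schwarz step can be written out explicitly. The display below is a short sketch in the notation of (1); the symbol $\rho_C$ is introduced here only for illustration and denotes the cross-section correlation between $Y(t_1)$ and $Y(t_2)$:

$$|\lambda| = \frac{|\mathrm{Cov}_C(Y(t_2), Y(t_1))|}{\mathrm{Var}_C(Y(t_1))} \le \frac{\sqrt{\mathrm{Var}_C(Y(t_1))\, \mathrm{Var}_C(Y(t_2))}}{\mathrm{Var}_C(Y(t_1))} = 1,$$

where the final equality uses $\mathrm{Var}_C(Y(t_1)) = \mathrm{Var}_C(Y(t_2))$. Under equal variances, $\lambda = \rho_C$, so the bound holds with equality only when the two cross-sections are perfectly correlated.
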
Equation (1) then implies:

$$P[Y(t_2) - Y(t_1) \mid 1, Y(t_1)] = \mu - (1 - \lambda) \cdot Y(t_1) \tag{2}$$

for some $\mu$ and $\lambda < 1$. By the last inequality, the cross-section regression coefficient on $Y(t_1)$ in (2) is
non-positive. In words, when $t_1 < t_2$, a regression of income growth on initial levels shows countries
that are initially richer tend to grow more slowly. Scaling the dependent variable by $(t_2 - t_1)^{-1}$,
i.e., using average income growth on the left hand side, does not alter this conclusion. Again, this
apparent convergence occurs when there is, by assumption, no real convergence.
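
The two-period argument can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a bivariate normal cross-section with unit variance at both dates and correlation `rho`, and the names `ols_slope` and `simulate` are hypothetical helpers introduced only for this illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a least squares regression of y on a constant and x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, rho=0.6, seed=0):
    """Draw n countries at two dates with equal cross-section variance."""
    rng = random.Random(seed)
    y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed design: Y(t2) = rho * Y(t1) + sqrt(1 - rho^2) * noise,
    # which keeps Var(Y(t2)) = Var(Y(t1)) = 1, i.e. no convergence.
    scale = (1.0 - rho ** 2) ** 0.5
    y2 = [rho * a + scale * rng.gauss(0.0, 1.0) for a in y1]
    growth = [b - a for a, b in zip(y1, y2)]
    return {
        "var_t1": statistics.pvariance(y1),
        "var_t2": statistics.pvariance(y2),
        "slope": ols_slope(y1, growth),  # estimates -(1 - lambda)
    }


result = simulate()
print(result)
```

Under these assumptions the two cross-section variances are approximately equal, while the estimated slope is close to $-(1 - \lambda) = -0.4$: a negative initial level coefficient with no change in dispersion.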

3.  Dynamics  and  Arbitrary  Signs  on  the  Initial  Level 

The  classical  Galton  fallacy  above  is  useful  for  analyzing  observations  made  at  two  points  in  time. 
When $Y_j$ has interesting dynamics, it turns out that an initial levels regression can give either a
strictly  positive  or  a  strictly  negative  coefficient,  even  when  the  cross-section  distribution  remains 
unchanging  over  time. 

While the point can be made quite generally, again it is instructive to take the simplest case. For
each $t$, let $\{Y_j(t),\ j = 1, 2, 3, \ldots\}$ be independent, and let $G_t$ denote the cross-section distribution
at time $t$:

$$G_t(y) = \text{fraction of } j \text{ such that } Y_j(t) \le y, \qquad y \in \mathbb{R}.$$

Suppose further that for each $j$, the time series $\{Y_j(t),\ t = \ldots, -1, 0, 1, \ldots\}$ is zero-mean and
stationary, and has finite variance and normally distributed innovations; let

$$Y_j(t) = \sum_{s=0}^{\infty} C(s)\, \epsilon_j(t - s), \qquad \epsilon_j \sim N(0, \sigma^2),$$

be the Wold representation for $Y_j$. Call $\omega^2 = \sigma^2 \sum_s |C(s)|^2$. Then

$$(2\pi\omega^2)^{-1/2} \int_{-\infty}^{y} \exp\left(-\frac{x^2}{2\omega^2}\right) dx = F(y)$$

is the unique ergodic distribution for the stochastic process $\{Y_j(t),\ t = \ldots, -1, 0, 1, \ldots\}$ for each $j$.
Assume the number of countries is large and initialize the cross-section distributions $G_t$, $t \le 0$, to
equal the time-series ergodic distribution $F$. At each $t \ge 1$, take $G_{t-s}$, $s \ge 1$, as given and apply
the Glivenko-Cantelli Lemma; it follows immediately that

$$G_{t+1} = G_t = G_0 = F, \qquad t \ge 1.$$

In  words,  the  assumptions  imply  that  the  cross-section  distribution  of  countries  is  time-invariant. 
The cross-section initial levels regression (with $t_0 < t_1 < t_2$) is now:

$$P[Y(t_2) - Y(t_1) \mid 1, Y(t_0)] = \zeta + \beta Y(t_0), \tag{3}$$

for some $\zeta$ and

$$\beta = g_Y(0)^{-1}\, (g_Y(t_2 - t_0) - g_Y(t_1 - t_0)),$$

with $g_Y$ denoting the covariogram of $Y_j$. (Barro [1989] and Murphy, Shleifer, and Vishny [1990] have
considered exactly this configuration in $(t_0, t_1, t_2)$.) Because the cross-section distribution matches
the ergodic distribution, the cross-section covariances exactly equal the corresponding unconditional
time series moments. Notice that while $g_Y(0) > 0$ and $g_Y(k) \to 0$ as $k \to \infty$, intermediate values
of $g_Y$ are unrestricted. Thus, the regression coefficient $\beta$ on $Y(t_0)$ can take arbitrary sign. If, for
instance, $t_2 \to \infty$, $\beta$ simply has the opposite sign to $g_Y(t_1 - t_0)$.

In summary, the initial levels regression coefficient has a sign that is completely uninformative
for  whether  the  cross-section  distribution  is  converging  or  diverging.  In  the  example  above,  the 
cross-section  distribution  is  unchanging  over  time,  yet  the  sign  of  the  regression  coefficient  can  be 
negative,  positive,  or  zero. 
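
As a rough numerical illustration of the coefficient in (3), the sketch below evaluates it from a covariogram for a first-order autoregression. The AR(1) specification, the dates (0, 1, 2), and the function names `ar1_covariogram` and `initial_level_beta` are assumptions made for this example only; the argument above holds for any stationary covariogram.

```python
def ar1_covariogram(rho, sigma2=1.0):
    """Return g(k), the lag-k autocovariance of Y(t) = rho * Y(t-1) + e(t).

    Assumes |rho| < 1 and innovation variance sigma2 (illustrative choices).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")

    def g(k):
        return sigma2 * rho ** abs(k) / (1.0 - rho ** 2)

    return g


def initial_level_beta(g, t0, t1, t2):
    """Coefficient on Y(t0) in the projection of Y(t2) - Y(t1) on (1, Y(t0))."""
    return (g(t2 - t0) - g(t1 - t0)) / g(0)


# Each process is stationary, so the cross-section distribution is the
# same at every date; only the serial correlation differs across cases.
beta_persistent = initial_level_beta(ar1_covariogram(0.5), 0, 1, 2)
beta_oscillating = initial_level_beta(ar1_covariogram(-0.5), 0, 1, 2)
beta_white_noise = initial_level_beta(ar1_covariogram(0.0), 0, 1, 2)

print(beta_persistent, beta_oscillating, beta_white_noise)
```

In this AR(1) case the coefficient reduces to $\rho^2 - \rho$, which is negative for $\rho = 0.5$, positive for $\rho = -0.5$, and zero for white noise, even though the cross-section distribution never changes.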

The independence and identical distribution assumptions here play a role only in simplifying
the calculations. With heterogeneity, the time-invariant cross-section distribution is a probability
mixture of the different individual time series ergodic distributions. Weak forms of dependence
across countries will not affect application of the Glivenko-Cantelli law. With strong dependence or
small numbers of countries, the cross-section distribution will be a non-degenerate random element
in the space of distributions. While the calculations then become much more difficult, the flavor of
the results is unaffected. Finally, normality is used only to give an explicit form to the individual
time series ergodic distribution.

4.  A  Possibly  Correct  Formulation 

A  dynamic  panel  data  random  effects  model  might  be  thought  to  justify  the  usual  interpretation  of 
these initial levels regressions. We will see, however, that there are serious econometric difficulties
in  this  view. 

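Suppose $Y_j(t)$, $j = 1, 2, \ldots, N$, $t = 0, 1, \ldots, T$, is generated by

$$Y_j(t) = X_{j1}(t) + X_{j0}(t) = \alpha_j + \theta_j t + X_{j0}(t), \qquad E\,\Delta X_{j0} = 0$$

$$\Rightarrow\ \Delta Y_j(t) = \theta_j + \Delta X_{j0}(t), \qquad E\,\Delta X_{j0} = 0, \tag{4}$$

and

$$\theta_j = Z_j \beta_0 + u_j, \qquad E\,u_j Z_j = 0. \tag{5}$$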

The zero expectation conditions in (4) and (5) are identifying assumptions. Equation (4) states that
country $j$'s (log) income $Y_j$ consists of two components $X_{j1}$ and $X_{j0}$. In the current work, $X_{j1}$
is taken to be just a time trend $X_{j1}(t) = \alpha_j + \theta_j t$. Equation (5) is a regression that describes how
growth rates $\theta_j$ vary across countries. Notice that $\theta_j$ is the growth rate of both the unobserved
component $X_{j1}$ and the observed series $Y_j$ (since $E\,\Delta X_{j0} = 0$). The covariates $Z_j$ might include
measures of average education, health, openness of the economy, as well as the initial condition
$Y_j(0)$.

While we have specified $X_{j1}$ to be a time trend, more generally $(X_{j1}, X_{j0})$ could be simply
a decomposition of $Y_j$ into quite arbitrary stochastic permanent and transitory components (as in
e.g. Quah (1990c)). This, however, would considerably complicate the discussion without introducing
any new insights.

The underlying growth rate $\theta_j$ is unobservable and needs to be proxied in estimating (5). One
possibility is to use:

$$\hat\theta_j = (t_2 - t_1)^{-1} \sum_{t = t_1 + 1}^{t_2} \Delta Y_j(t) = (t_2 - t_1)^{-1}\, (Y_j(t_2) - Y_j(t_1)). \tag{6}$$

(Another possibility is to take $\hat\theta_j$ to be the time trend coefficient in a least squares regression of
$Y_j$ on a constant and time: nothing essential would change in the discussion.) Thus, a regression
of average growth rates (6) on $Z_j$ should be viewed simply as an imperfect way of estimating the
underlying regression (5). While $\hat\theta_j$ is an error-ridden measure of $\theta_j$, it appears only on the left-hand
side of the equation. Thus, it might seem that classical regression analysis suggests no significant
problems.

In fact, however, the model here does not give rise to classical measurement error. Straightforward
calculation shows that the least squares regression estimator $\hat\beta$ computed using (6), instead
of the true $\theta_j$, satisfies:

$$\sqrt{N}\,(\hat\beta - \beta_0) = \Big(N^{-1} \sum_{j=1}^{N} Z_j' Z_j\Big)^{-1} \Big(N^{-1/2} \sum_{j=1}^{N} Z_j' u_j\Big) + \Big(N^{-1} \sum_{j=1}^{N} Z_j' Z_j\Big)^{-1} \Big(N^{-1/2} \sum_{j=1}^{N} Z_j'\, [X_{j0}(t_2) - X_{j0}(t_1)]\, (t_2 - t_1)^{-1}\Big). \tag{7}$$

Since $E Z_j' u_j = 0$ by (5), the first term gives the standard OLS zero mean normal distribution
approximation for large $N$. However, since neither $E Z_j X_{j0}$ nor $E Z_j \Delta X_{j0}$ is restricted, the second
term dominates as $N$ grows without bound: it diverges to plus or minus infinity. Thus, this regression
yields an inconsistent estimator for $\beta_0$.¹ This effect is especially pronounced when $Z_j$ explains both
short-run and long-run dynamics in $Y_j$, as would be standard in real business cycle models. When
$Z_j$ is the initial condition $Y_j(0)$, the analysis of the previous section again applies. Thus, even in
this setting, it is difficult to interpret the results of initial levels regressions, in terms of convergence
or of divergence.
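
The inconsistency can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the text: the transitory component is taken to be white noise in levels with unit variance, $\alpha_j$ is standard normal, $t_1 = 0$, the single covariate is $Z_j = Y_j(0)$, and the true $\beta_0$ is zero. The names `ols_slope` and `simulate_panel` are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a least squares regression of y on a constant and x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n=20000, horizon=5, beta0=0.0, seed=1):
    """Regress the proxy growth rate on the initial level Y_j(0)."""
    rng = random.Random(seed)
    z, theta_hat = [], []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)    # intercept alpha_j (assumed N(0, 1))
        x_start = rng.gauss(0.0, 1.0)  # transitory component at date 0
        x_end = rng.gauss(0.0, 1.0)    # transitory component at date T
        y0 = alpha + x_start           # initial level, used as Z_j
        # Growth rate as in (5); the error u_j is independent of Z_j.
        theta = beta0 * y0 + rng.gauss(0.0, 0.1)
        y_end = alpha + theta * horizon + x_end
        z.append(y0)
        theta_hat.append((y_end - y0) / horizon)  # proxy as in (6), t1 = 0
    return ols_slope(z, theta_hat)


slope = simulate_panel()
print(slope)
```

With these choices the covariance between $Z_j$ and the transitory part of the proxy is $-1/T$ and the variance of $Z_j$ is 2, so the estimate settles near $-1/(2T) = -0.1$ instead of the true value of zero, however large $N$ becomes.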


¹ Some might argue that the right conceptual experiment in (7) is to take $t_2 - t_1 \to \infty$ and then
consider the approximation as $N$ grows large. The second term in (7) might then be negligible. This
is delicate: recall that in the Summers-Heston data set, the one typically used in this work, $t_2 - t_1$ is
at most 36 while $N$ is about 100, so that $N > T$ rather than the other way round.

5.  An  Alternative  Interpretation

An alternative interpretation of (3) and (4)-(5) of the previous sections is possible. Some unobserved
common factor might cause both high growth rates $\theta_j$ and high initial levels $Y_j(t_0)$. In this view,
$\theta_j$ and $Y_j(t_0)$ are jointly "endogenously" determined: (5) is a reduced form for some unspecified
underlying structural model. This avoids interpreting $\beta_0$ as a structural economic parameter; the
sign of $\beta_0$ in (5) nevertheless remains of interest as that is thought to indicate the validity of this
hypothesis.

The Galton fallacy criticisms in Sections 3 and 4 are, of course, unaffected by this alternative
interpretation. The initial levels regression coefficient says nothing about (d)-convergence regardless
of whether $\beta_0$ is part of the structural or reduced forms. Significantly, Section 3 implies that a
particular sign on $\beta_0$ could be purely spurious from the viewpoint of the hypothesis here. In this
sense, any empirical finding on the sign of $\beta_0$ turns out to be not especially informative.

6.  Conclusion 

We  have  shown  that  cross-section  regressions  of  growth  rates  on  initial  levels  shed  no  light  on 
the  validity  of  the  convergence  hypothesis  in  the  sense  of  (d).  It  should  be  evident  that  conditioning 
on  additional  regressors  does  not  alter  this  basic  message.  Having  clarified  this,  it  is  important  to 
emphasize  what  the  paper  does  not  say:  it  is  not  that  there  are  "econometric  problems"  in  estimating 
these  initial  level  regressions.  On  the  contrary,  in  every  situation  described  above,  except  for  the 
analysis  in  Section  4,  the  regressions  do  the  right  thing:  they  consistently  estimate  exactly  what  they 
are  supposed  to  estimate.  (The  model  of  Section  4  appears  to  come  closest  to  allowing  the  desired 
economic  interpretation;  there,  however,  we  find  significant  econometric  difficulties  in  interpreting 
the  results  from  the  estimation.  Even  if  these  problems  could  be  overcome,  those  raised  in  Sections 
2  and  3  would  nevertheless  remain.) 

The  difficulty  therefore  lies  not  in  the  econometrics  but  rather  in  the  economic  interpretation 
of these initial levels regressions: subtlety arises because researchers have provided only incomplete
probability descriptions of the effects they are trying to uncover.

It  would  be  useful  to  display  an  explicit  probability  model  where  such  cross-section  initial  levels 
regressions  are,  in  fact,  sensible  descriptive  devices.  Such  a  probability  description  would  help  clarify 
what  it  is  that  economists  are  using  different  growth  models  to  explain.  King  and  Robson  (1989)  is 
a  useful  step  in  this  direction. 

References 

Barro,  R.  J.,  1989,  A  Cross-Country  Study  of  Growth,  Saving,  and  Government,  NBER  Working  Paper 
2855,  February. 

Baumol,  W.,  1986,  Productivity  Growth,  Convergence,  and  Welfare,  American  Economic  Review,  76, 
no.  5,  December,  1072-85. 

DeLong, J. B., 1988, Productivity Growth, Convergence, and Welfare: Comment, American Economic
Review, 78, no. 5, 1138-1155.

Dowrick,  S.  and  D.  Nguyen,  1989,  OECD  Comparative  Economic  Growth  1950-85:  Catch-Up  and 
Convergence,  American  Economic  Review,  79,  no.  5,  1010-1030. 

King, M. A. and M. H. Robson, 1989, Endogenous Growth and The Role of History, LSE Financial
Markets Group Discussion Paper No. 63, August.

Murphy,  K.,  A.  Shleifer,  and  R.  Vishny,  1990,  The  Allocation  of  Talent:  Implications  for  Growth,  GSB 
Univ.  of  Chicago  Working  Paper,  March. 

Quah, D., 1990a, Persistence in Income Disparities: I. Unit Root Random Fields, MIT mimeo, March.

Quah, D., 1990b, Persistence in Income Disparities: II. Heterogeneous Markov Chains and Duration
Dependence, MIT mimeo, in preparation.

Quah, D., 1990c, Permanent and Transitory Components in Labor Income: An Explanation for 'Excess
Smoothness' in Consumption, Journal of Political Economy, 98, no. 3, 449-475.

White,  H.,  1984,  Asymptotic  Theory  for  Econometricians,  New  York:  Academic  Press. 




